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SUMMARY 

A  fully  Bayesian  analysis  of  linear  and  nonlinear  population  models  has 
previously  been  unavailable,  as  a  consequence  of  the  seeming  impossibility 
of  performing  the  necessary  numerical  integrations  in  the  complex  multi- 
parameter  structures  typically  arising  in  such  models.  It  is  demonstrated 
that,  for  a  variety  of  linear  and  ix>nlinear  population  models,  a  fully 
Bayesian  analysis  can  be  Implemented  in  a  straightforward  manner  using  the 
Gibbs  sampler.  The  approach  is  illustrated  with  examples  involving 
challenging  problems  of  outliers  and  mean-variance  relationships  in  population 
modelling. 


Keywords:  Population  models;  hierarchical  models;  Bayesian  analysis; 

Gibbs  sampler;  population  outliers;  nonlineEu*  Bx>dels;  mean-variance 
relationships.  ed 
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WTROOUCnON 


1. 1  Population  Models 

Population  Models  are  widely  used  In  blonetrical  growth  analysis  (see, 
for  exanple,  Berkey,  1982,  Lange.  Carlin  and  Gelf and,  1992),  In  pharaacoklnetlc 
studies  as  part  of  drug  developaent  procedures  (see,  for  exasple,  Beal  and 
Shelner,  1980,  Llndstron  and  Bates,  1990),  and  have  a  long  history  of  use  In 
ed\K;atlonal  research  (Novlck  et  al,  1972),  econometrics  (Swany,  1970)  and 
other  fields.  Related  models  are  now  Increasingly  used  for  multi-centre 
clinical  trials  (Skene  and  Uakefleld,  1990)  aiwi  for  spatial  epidemiology 
studies  (Besag  et  al,  1991). 

From  a  Bayesian  perspective,  such  models  are  variations  on  and  extensions 
of  the  following  hierarchical  structure.  Let  y  denote  the  totality  of 
measurement  data  on  I  Individuals  In  a  designated  population  (for  example,  of 
patients,  experimental  animals,  firms,  etc.);  let  6  denote  the  parameters 
defining  I  underlying  'response*  profiles  (for  example,  weight  versus  age, 
drug  concentration  versus  time  after  administration,  profits  versus  structural 
variables  defining  a  firm,  etc. )  and  f  denote  hyperparameters  defining 
relationships  among  components  of  6.  Then,  population  models  correspond  to 
the  hierarchy  of  distributions 


(y|e]  .  le\f]  .  If]  .  (l.i) 

Uhere,  adopting  the  notation  of  Gelf  and  and  Smith  (1990),  Joint,  conditional 
and  marginal  densities  for  random  quantities,  u,  v  are  denoted,  respectively, 
by  (u,  v],  Iu|v],  lul,  (v). 


In  the  context  of  (1.1),  Interest  nay  centre  on  Inference  for  components 
of  e  (J.e.  relating  to  aspects  of  specific  Individual  profiles),  or  for  f 
(i.e.  relating  to  population  characteristics),  or  on  predictions  of  future 
observations  from  an  already  included  individual  or  a  new  individiial  drawn 
from  the  sane  population.  In  all  cases,  the  Integrals  required  for  a  fully 
Bayesian  analysis  are  typically  not  available  In  closed  form  and  numerical  or 
analytic  approximation  Is  required.  Hitherto,  however,  no  approximation 
approach  has  been  found  to  be  entirely  satlstfactory.  The  p\irpose  of  this 
paper  Is  to  demonstrate  that  a  highly  effective  Bayesian  computation  strategy 
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for  general  population  sodel  analysis  Is  available,  based  on  the  Gibbs 
sampler. 

1.2  Structxire  of  the  paper 

In  Section  2,  we  consider  in  detail  two  population  model  examples  (one 
linear,  one  non-llnear),  idiich  pose  challenging  problems  going  beyond  basic 
population  analysis,  by  modelling  and  analyzing  population  outliers  and  mean- 
variance  relationships.  In  Section  3,  we  provide  a  description  of  the  Gibbs 
sampler  approach  to  Bayesian  calculations  for  hierarchical  models.  In  Section 
4,  we  analyse  In  detail  the  linear  model  example,  exhibiting.  In  particular,  a 
method  for  detecting  population  outliers.  In  Section  5,  we  analyse  In  detail 
the  nonlinear  model  example.  We  show.  In  particular,  that  the  Inclusion  of 
mean-varlEUice  relatlonshlf>s  causes  little  additional  computational  difficulty 
with  the  Gibbs  sampler  approach.  The  key  message  in  both  Sections  4  and  5  is 
that  the  seemingly  Intractable  calculations  associated  with  the  Bayesian 
analysis  of  population  models  do  Indeed  become  relatively  stralghtforwaixl 
\inder  the  Gibbs  sampler  approach.  In  Section  6,  we  put  the  Gibbs  Sampler 
approach  in  perspective  by  commenting  briefly  on  other  available  alternatives. 


2  I1JLU5TR4TIVE  EXAMPLES 

2.1  A  linear  population  biological  growth  example 

Table  1  records  dental  measurements  of  the  distance  (In  mm)  from  the 
centre  of  the  pituitary  to  the  pteryo-maxlllary  fissure  in  11  girls  and  16 
boys  at  the  ages  of  8,  10,  12  and  14  years.  Both  in  the  original  analysis 
(Potthoff  and  Roy,  1964)  and  In  a  Bayesian  reanalysls  (Feam,  1975),  a  linear 
growth  relationship  between  the  dental  measurement  and  age  was  assumed.  This 
Is  also  assumed  in  our  subsequent  analysis,  together  with  homoscedastlc  normal 
errors  within  each  separate  population  (girls,  boys).  Let  x^j,  denote, 
respectively,  the  Jtti  time  point  (using  age  11  as  origin)  and  associated 
measurement  on  the  Ith  Individual  (I  ■  1,...,11  for  the  population  of  girls,  j 
■  12 . 27  for  the  population  of  boys,  J  «  1,,..,4). 


Table  1  Here 
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For  both  the  girl  and  boy  populations  the  first  stage  model  of  (1.1) 
takes  the  form 

n  n  "  n  n 

“  j  ^  j 

(1.2) 

“  n  II  • 

where  t  denotes  the  common  normal  measurement  precision  (reciprocal  variance) 
and  *  (a^,  denotes  the  intercept  and  slope  for  the  ith  individual’s 
straight-line  growth  curve. 

Feam  (1975)  takes  as  the  second  stage  (population)  distribution  for  the 
e^’s  a  bivariate  normal  distribution  (separately,  for  eEu;h  of  the  girl  and  boy 
populations),  so  that 


II  *  I^  P  • 

with  f  *  (ji,  I),  »rtiere  ■  (pj,  £(a^)  «  Pj,  Eipp  «  p^,  so  that  the 

individ\ial  straight-line  growth  curves  are,  in  effect,  regarded  as 
distributed  around  a  ’mean’  population  growth  curve,  p^  +  p^z  with  population 
variation  described  by  the  2x2  covariance  matrix  Z.  (We  digress  to  note 
that  since  the  are  positive  in  this  case  it  might  be  more  reasonable  to 
assume  log  to  be  normally  distributed.  Such  a  refinement  is  not,  in  fact, 
important  in  this  example  and  so  we  shall  not  pursue  it,  in  order  to  keep  this 
initial  exposition  as  simple  as  possible.  We  shall  illustrate  such  a 
transformation  in  our  second,  nonlinear,  example. )  In  >diat  follows,  we  shall 
denote  p^  by  (a^j  for  the  girl  (boy)  populations  and  p^  by 

Since  information  from  individuals  within  each  population  is  effectively 
‘pooled’  to  give  population  ’mean’  inferences,  it  can  be  important  in  such 
studies  to  guard  against  an  aberrant  or  ’outlying*  individual  unduly 
infliiencing  the  population  Inference.  Proceeding  naively,  examining,  for 
example,  the  pooled  population  of  girls  and  boys,  one  night  plot  the  least- 
squares  estimates  of  intercepts  and  slopes,  as  in  Figure  1.  Should  one 
conclude  from  the  plot  that  Um  boy  labelled  24  is  a  ’slope  outlier’?  Or  that 
the  boy  labelled  21  is  an  ’intercept  outlier’?  We  seek  a  modelling  analysis 
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strategy  which  will  -provide  both  a  coherent  outlier  detection  diaignostic  and 
direct  inferences  vdiich  accommodate  the  effect  of  any  outliers  present. 


Figure  1  Here 


The  strategy  we  shall  adopt  is  to  replace  the  population  bivariate  normal 
assumption  for  the  e^’s  by  a  bi'/ariate  Student-t  assumption  (see,  for  example. 
Smith,  1983,  0’ Hagan,  1987,  for  general  discussions  of  modelling  with  heavy¬ 
tailed  distributions).  The  hierarchical  model  is  then  completed  by  assigning 
priors  to  T,  p  and  £.  As  we  shall  show  in  Section  4,  analysis  of  this  model 
(perhaps  surprisingly)  is  still  easily  implemented  via  the  Gibbs  sampler  and 
provides  a  novel  fora  of  graphical  diagnostic  for  second-stage  outliers  in 
hiersLTChical  models. 

The  main  inference  questions  in  this  study  relate  to  differences  in 
growth  between  the  girl  and  boy  populations.  Vfe  shall  provide  illustrative 
analyses  of  this  in  Section  4.  taking  into  account  the  outlier  issue  disciissed 
above. 

2.2  A  nonlinear  population  pharmacokinetic  example 

Table  2  presents  pharmacokinetic  data  on  tlM  plasma  concentration  of  the 
drug  Cadralazine  in  10  cardiac  failure  patients  at  veu'ious  times  after  the 
administration  of  a  single  dose  of  30mg. 


Table  2  Here 


The  starting  point  for  modelling  the  first  stage  of  the  hierarchy  in  this 
case  is  the  one-compartment  nonlinear  model  for  Individual  plasma 
concentrations  [y^j)  against  tine  [x^j)  (see  Racine  et  ai,  1986),  tdilch 
implies  that 


plasma  concentration  ■  30  x  exp(-8j  X  time)  ,  (2.1) 

where  a^,  (>  0)  are,  respectively,  the  volume  of  distribution  and 

elimination  rate  for  individual  i,  i  *  1,..,10. 
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Mettsureaent  variance  is  certainly  related  to  underlying  concentration 
level  in  studies  such  as  this,  so  that  a  sinple  additive  hoaoscedastic  noraal 
error  assuaption  for  the  first  stage  distribution  is  inappropriate.  We  shall 
illiistrate  possible  aodels  and  their  subsequent  analyses  by  considering  the 
following  three  intra-individiial  error  structures. 

As  a  first  possibility,  letting  (a^,  and  denoting  the  right- 
hand-side  of  (2.1)  by  ^  assuae  that 

10*  y,j  -  log  *  Cy  . 

with  independent  normal  errors  having  zero  mean  and  constant  variance 
A  second  modelling  possibility  is  to  assume  that 


>'ij  ■  V*i’  *  'ij  • 


with  independent  noraal  errors  having  zero  mean  and  variances  given  by 


(2.2) 


so  that  7  a  0  indexes  a  power  law  relationship  between  the  variance  and  the 
mean.  We  shall  refer  to  these  two  variance  aodels  as  the  lognormal  model  and 
the  power  model. 


In  Section  5  we  shall  show,  in  fact,  that  neither  of  these  formulations 
is  adequate  for  the  Cadralazine  data.  We  now  describe  a  third,  more  complex 
error  model. 


Careful  study  of  data  resulting  from  the  analytical  assay  technique 
revealed  that  the  variance  became  approximately  constant  for  low 
concentrations,  but  increased  as  a  function  of  the  mean  for  larger 
concentrations.  To  model  this  behaviour  directly  would  require  a  ‘cut-off’ 
point  for  concentrations,  below  Uhlch  the  variance  was  assumed  constant  and 
above  vhlch  the  power  model  was  used.  Such  a  model  requires  additional 
parameters,  so  instead  the  power  model  (2.2)  was  used,  but  it  was  assumed  that 


the  error  distribution  is  a  truncated  noraal  distribution  with 
model  reproduces  the  correct  behaviour  and  also  ensures  that 


lo^lctlve 


This 
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distributions  will  always  produce  non-negative  concentrations. 


For  the  (population)  second  stage  of  the  hierarchical  aodel  we  define 
e.  «  (log  a,,  log  ^,)  and  assume  that  the  the  e.*s  follow  a  bivariate  noraal 

-•i  i  i 

distribution  with  aean  p  *  (p^,  p^)  vtd  covariance  matrix  £.  The  hierarchical 
aodel  is  then  completed  by  assigning  priors  to  the  elements  of  t,  fi,  £  and, 
for  the  second  and  third  of  the  above  models,  y.  We  shall  show  in  Section  5 
that  the  Gibbs  sampler  again  permits  relatively  straightforward 
implementation,  despite  the  complications  of  nonlinearity,  mean-variance 
relationships  and  parameter  transformations  in  specifying  a  population 
distribution. 

In  pharmacokinetic  studies  the  main  questions  relate  to  population 
inferences  and/or  inferences  for  future  concentration  levels.  Though  not 
relevant  to  our  particular  example,  the  former  question  may  relate  to  the 
identification  of  important  covariates  such  as  age.  weight  and  sex.  Interest 
nay  focus  on  nonlinear  functions  of  the  population  parameters,  for  example, 
the  so-called  clearance  parameter  (defined  for  an  individual  by  nnd  the 

elimination  half-life  (defined  for  an  individual  by  logZ/fi^.  Issues  arise 
here  concerning  the  best  summaries  of  the  84;>propriate  population  analogue 
(mean?  node?  median?  quantiles?).  In  fact,  from  a  Bayesian  perspective  the 
single  most  relevant  summary  will  often  be  the  predictive  distribution  (for 
example;'  of  the  half-life)  for  a  new  individual  from  the  population,  or  from  a 
subpopulation  (typically  conditioning  on  covariates).  A  predictive 
distribution  for  concentration  can,  in  particular,  enable  the  benefits  of  a 
candidate  dosage  regimen  to  be  investigated. 


3.  THE  GIBBS  SAMPLER 


Suppose  that  the  Joint  probability  structure  for  a  collection  of  random 
variables  (7^, . . .  .(/^  is  sixrh  that  the  Joint  density  [Uy  . . .  is  uniquely 
determined  by  the  full  conditional  densities  C(7_|I7_.  res],  s  >  l,...,k. 
Suppose  that  samples  of  can  be  generated  efficiently  from  res] 

given  specified  values  of  the  conditioning  variables,  res.  An  algorithm 
for  extracting  information  from  these  full  conditional  distributions  in  order 

to  estimate  the  marginal  distributions,  ({/  ].  s  >  1 . k,  has  been  discussed 

by  Hastings  (1970)  and  Geman  and  Geman  (1984).  This  so-called  Gibbs  sampler 
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algoritha,  further  developed  and  illustrated  In  Gelfand  and  Smith  (1990)  and 
Gelfand  et  p'  (1990).  Is  a  Markovian  updating  scheme  \diich  proceeds  as 
follows. 


Given  arbitrary  staring  values,  for  the  k  random 

variables,  we  generate  a  random  variate  from  (I/.  ,  ,1/^®^],  followed 

by  I/‘‘>rro.  C . b'‘”i.  ^  on  up  to%< »  frot 

^(1)  ill  ^  ^ 

[(/.  |(/i  '].  This  completes  one  Iteration  of  the  sampling  scheme. 

After  t  such  Iterations,  we  would  arrive  at  a  Joint  sample  (1/^  . ). 

As  t  -»  «,  Geman  and  Geman  show  (under  rather  mild  regularity  conditions)  that 
this  tends  In  distribution  to  a  variable  having  the  Joint  distribution 

[{/^. . . .  ,U^].  Suppose  that  we  have  m  Independent  realizations  of  [I/^ . 

Such  realizations  could  arise  from  the  relpllcatlon  of  the  iterative  cycle  m 
times  or  from  the  use  of  a  smaller  number  of  such  cycles  from  idilch,  for  t 
sufficiently  large,  realizations  are  extracted  a  number  of  Iterations  apart, 
the  gap  being  large  enough  to  ensure  the  ‘Independence*  of  the  subsequent 
samples.  See  for  example,  Raftery  and  Lewis  (1992).  Ue  note  that  the 
parameterization  of  the  model  Is  relevant  to  the  choice  of  Gibbs  strategy. 

See  Hills  and  Smith  (1992)  for  general  comment,  Uedcefleld  (1992)  for  comments 
specific  to  the  application  In  Section  5.  The  replicate  values  for 

•  •  •  • _ •  ■•y*  c®*'  **  regarded  as  a  sample  of  size  a  from  tU  ].  Based 

S  Si  Sm  S 

on  the  a  lid  k*tuples  ((/.  . . J  •  l,...,a,  the  marginal  density  [(/  ] 

IJ  KJ  S 

can  be  approximated  by  the  finite  mixture  density 


-  "rr 


If  the  right-hand  side  Is  explicitly  available.  Alternatively,  an  estimate 


[U  ]  could  be  obtained  using  a  kernel  density  estimate  based  on  U . 

s  ^  '  si 

Moreov«*,  If  Interest  centres  on  the  marginal  distribution  for  a  random 


U  . 
sm 


variable  V  •  glUy.. '  '^jc^*  arbitrary  function  g,  we  note  that 

evaluation  of  g^V^j . *  l.-->ta.  directly  provides  a  sample 

. V^,  so  that  [K]  is  Immediately  available  from  a  kernel  density 

estimate.  See  Gelfand  and  Smith,  1990,  for  further  discussion. 


C 


T\imlng  specifically  now  to  Bayesian  applications,  suppose  that  ^  « 

is  a  parameter  vector  of  interest  and  that,  given  h(f)  «  [^|data] 
we  wish  to  evaluate  [^^|data],  for  some  or  all  of  i  «  l,...,Jc.  Successively 
regarding  h(^)  as  a  fuixstion  of  for  fixed  J  *  i,  iaaediately  identifies 
functions  J  *  D,  i  «  tdiich  are  such  that 

J  •  1)  «  J  *  <**^*»1 

Froa  knowledge  of  J  ^  1)  ve  can  sample  from  J  •  1.  data!,  so 

that  the  Gibbs  sampling  algorithm  is  seen  to  provide  a  general  solution  to  the 
problem  of  calculating  marginal  densities  given  the  specification  of  a 
likelihood  and  prior. 

In  fact,  very  general  methods  (Including  ratio-of -uniforms  and  envelope 
rejection  techniques)  are  available  for  random  variate  generation,  given  only 
the  form,  up  to  proportionality,  of  the  density  function  (see,  for  example, 
Ripley,  1987).  Ue  shall  describe  a  recent  adaptation  of  the  ratio-of -uniforms 
method  in  Section  5.  If  inferences  are  only  required  in  the  form  of  summaries 
of  posterior  densities,  for  example,  moments  or  quantiles,  these  are  obtained 
in  a  straightforward  manner  from  the  samples  generated  by  the  Gibbs  sampler 
(see  Gelfand  and  Smith,  1991). 

To  examine  the  structure  of  the  Gibbs  sampler  in  the  context  of 
population  hierarchical  models,  consider  first  a  very  general  Bayesian 
hierarchical  model  having  K  stages.  In  an  obvious  notation,  the  Joint 

distribution  of  data,  Y,  and  the  parameters  . from  each  of  the  stages 

is  given  by 


IwgJ*- •  |»jjl 

Suppose  now  that  inferential  Interest  centres  on  Iw^|y],  i  ■>  1,...,K  and  that 

we  wish  to  calculate  these  using  the  Gibbs  sampler,  so  that  we  need  to  sample 

iteratively  from  the  full  conditionals  [w  iF,  w  ,  res]. 

s '  r 

Examination  of  the  hierachical  structure  (a  Harkov  random  field  with  an 
‘adjacent*  neighbourhood  system)  reveals  that 
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I«^|y,  (J^,  r  *  s] 


s  “  1  . 

2  s  s  s  K-1  , 

s  «  K  , 


Vr  Vi' 

80  that  there  is  typically  considerable  siaplification  in  the  forms  of  the 
full  conditional  distributions,  greatly  facilitating  the  implementation  of  the 
Gibbs  sampler.  In  the  following  two  sections,  we  shall  illustrate  the  power 
and  relative  simplicity  of  the  latter  by  considering  the  two  particular 
population  problems  introduced  earlier. 


4.  AHALYSIS  OF  THE  LINEAR  GROVTH  EXAMPLE 

We  begin,  in  Section  4. 1.  by  reviewing  the  case  where  the  population 
(second-stage)  distribution  is  assumed  to  be  normal,  so  that  each  population 
follows  the  hierarchical  normal-linear  model  (Lindley  and  Smith,  1972).  Ue 
then,  in  Section  4.2,  extend  this  linear  case  to  incorporate  the  outlier 
accommodation  modelling  discussed  in  Section  2. 1  by  means  of  a  Student-t 
population  distribution.  An  illustrative  analysis  of  the  data  presented  in 
Section  2.1  is  given  in  Section  4.3. 

4. 1  A  normal-linear  population  model 

To  illustrate  the  structure  (1.1)  in  the  case  of  the  normal-linear 
hierarchical  model,  suppose  that  the  first-stage  model  has  the  form 

J  "l  I 

so  that,  «  ^^il’*  ^  ^  observation  vectors  is  modelled  as 

linear  structure  X^O^,  6^,  p  x  1,  with  conditionally  independent  honoscedastic 
normal  errors  having  variances  Suppose  further  that  the  second-stage 

structure  is  given  by 

I  I 

n  ■  n  'KSjie-  p  ■ 

1-1  ■*  j-T  ^ 
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so  that  the  first-steige  regression  parameter  vectors  are  taken  to  be  a  random 
sample  from  a  ‘population  distribution',  Z),  together  with 

[t]  «  C(t|XUq,  , 

corresponding  to  an  inverse-gamma  prior  for  the  common  variance  t 

Finally,  suppose  that  the  prior  specification  is  completed  by  assximing 
Vq,  Tq  known  and  taking  the  third-stage  of  the  hierarchy  to  have  the  form 

If]  =  »  W(M|2f.  p)  , 

with  1),  C,  p  and  R  known,  and  V  denoting  a  Uishart  distribution.  The  Uish2Lrt 
distribution  is  \ised  here  for  convenience  (sampling  from  it  is  straight¬ 
forward).  The  parameters  p  and  R  are  chosen  apriori:  p  -  p  is  most  non- 
informative  in  the  sense  that  its  distribution  is  flattest;  the  matrix  R  is 
chosen  to  be  an  approximate  estimate  of  £. 

Defining  y  =  . . .  .y^),  6  «  (0^, . .  •  0  “  Oj.,  D~^  «  + 

-1  -1  -1  -1  ^ 

I  ,  F  »  IZ  *  £  •  treating  f  »  Z,  t)  as  an  unknown 

parameter  of  dimension  p(I  't-  1)  Xp(p  -f  1)  ‘f  1,  it  is  easily  seen  that  a 

Gibbs  sampler  is  defined  by  the  following  conditional  distributions: 

[e^jy,  p,  z”',  T,  0j,  J  *  1]  »  N(e^|p^(T^yj  +  up  ,  i  •  i, ...  ,i  , 

[ply.  0.  Z"\  t]  «  W(p|F(/z“^0  +  C"S),  V)  , 

iz'^ly,  0,  fi.  t]  ■  V(Z"'|[£(0j  -  p)(0j  -  fi)^  *  I  *  p)  , 

f*  if*  “  G(t|X(i;q  +  n),  +  Wq'^qI)  • 

Generation  of  random  variates  is  straightforwardly  achieved  for  the  normal  and 
gamma  distributions:  generation  for  the  Vishart  distribution  is  achieved  \ising 
the  algorithm  given  in  Odell  and  Fieveson  (1966).  See  Gelfand  et  ml  (1990) 
for  further  details  and  an  application. 
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4.2  A  Student- linear  population  model 

To  robustlfy  the  second-stage  of  the  hierarchical  aodel  given  in  Section 
2.3,  suppose  that  we  wish  to  replace  the  assumption  that  the  0^  are  a  random 
sample  from  an  lf(p,  Z)  distribution  by  the  assumption  that  they  are  a  random 
sample  from  I),  a  multivariate  Student-t  distribution  with  mean  p, 

covariance  matrix  Z  and  degrees  of  freedom  w. 

One  way  of  representing  such  an  assumption  in  the  structure  (1.2),  is  to 

take 


J  I 

-  r|  "(Bile,  • 

so  that  now  ^  ^  (p,  Z,  X^,  —  ,A^),  and  then  subsequently  to  assiime  that 
[^\  «  {p](ZllXj]...[Xjl  . 

2 

id^re  [Xj]  is  defined  by  [uX^l  =  G(vX^|Xp,  X)  («  z^).  See,  also,  Racine-Poon 
(1992).  The  remaining  hierarchical  structure  is  defined  exactly  as  in  Section 
2.3.  For  references  establishing  that  [9, Ip,  Z,  XJ  »  iV(e,|p,  X~^Z)  and  [pX,] 
«  C(pXj|Xp,  X)  generate  21  ■  Z),  see  Johnson  and  Kotz  (1972, 

Chapter  27,  §3-4). 

Defining  «  xX^X^  +  X^'\  -  z“^  £  X^  +  c"\  X  «  (Xj . X^),  the 

full  conditionals  defining  the  Gibbs  sampler  become 

I!il^  t’  ^  ^  1 . ^  • 

Iply,  e,  z"^  T,  XI  «  w(jilD(z”'j:  x^e^  ♦  f\).  u) 

?•  e*  v?i  ■  ^  ^ 

ixjy,  0,  p,  Z  \  T,  Xj,  J  m  I]  m  C(p^^|X(p  ♦  p),  X)  , 


J2 


where 


V,  »  (e,  -  ^(e.  -  fi)  *  V 

Generation  of  all  the  required  random  variates  is  straightforward,  thus 
providing  >  via  the  Gibbs  sampler  -  a  fully  Bayesian  implementation  of  the 
hierarchical  linear  model  with  Student-t  second  stage.  By  using  the  same 
device  of  scale  mixtxires  of  normals,  the  first  stage  of  the  hierarchy  could 
also  be  taken  to  be  Student-t,  thus  providing  an  analysis  robust  to  both  data 
outliers  and  outlying  individuals  in  the  population.  More  generally,  the 
device  used  here  leads  to  a  straightforward  Gibbs  sampler  implementation  for 
any  robustifying  distribution  defined  as  a  scale  mixture  of  normals  (see 
Andrews  and  Mallows,  1974,  Carlin  and  Poison,  1992). 

4.3  Illustrative  analysis 

Using  the  model  structure  outlined  in  Section  4.2,  the  data  in  Table  1 
vsls  analysed  with  a  number  of  second  stage  assumptions.  The  first  analysis 
assumed  that  the  (a,  B)-pairs  from  each  of  the  girl  and  boy  populations  arose 
from  separate  bivariate  t-distributions  with  2  degrees  of  freedom.  Ue  denote 
this  model  by  St^’  For  this  example  the  values  chosen  for  the  hyper¬ 
parameters  were 


Vo  -  0.  C-l  -  0.  p  -  2.  5  -  [  J  ®  ]  . 

The  following  Gibbs  strategy  was  used:  25  cycles  were  run  initially  for  30 
iterations  before  being  increased  to  50  cycles.  These  were  run  for  30 
iterations  also  before  being  Increased  to  100  cycles  for  100  iterations. 

One  of  the  principal  alas  of  this  illustration  is  the  detection  of 
outlying  individuals  using  the  St^  model.  The  scale  parameter  is  a  good 
global  indicator  of  outliers.  The  prior  expectation  of  is  1,  so  that  a 
value  substantially  below  1  Indicates  that  the  ith  individual  parameter  vector 
(a^,  fij)  is  likely  to  be  far  away  from  the  population  mean  fi.  Since  the 
Mahalanobls  distance  is  effectively  used  here  to  measure  the  distance  from  p, 
Xj  provides  only  a  global  diagnostic  for  outliers.  To  investigate  further  the 
specific  elements  of  0j  for  which  the  particular  individual  is  outlying,  one 
needs  to  examine  moment  summaries  or  graphical  displays  of  a^,  marginally 
or  Jointly  ming  the  generated  samples.  Figure  2  displays  the  box  plots  of 
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the  posterior  sample  values  of  the  A^’s,  for  each  of  the  IB  boys.  It  is  clear 
that  boy  21  is  likely  to  be  an  outlier.  Boy  number  24  is  not.  however,  as 
extreme.  Further  investigation  of  the  marginal  posterior  plots  of  and 
enables  us  to  conclude  that  boy  21  is  an  intercept  outlier,  vdiereas  boy  24  is 
a  possible  slope  outlier. 


Figure  2  Here 


To  compare  the  influence  of  the  outliers  on  the  overall  inferences,  the 
data  set  of  the  boys  was  reanalyzed  using  the  normal  model,  first  with  the 
full  data  set  (NO),  then  with  boy  21  removed  (Nl)  and  then  with  both  boys  21 
and  24  removed  (N2}.  The  median,  5X  and  95%  posterior  saiq>le  percentiles  of 
a^.  and  t  ^  (the  population  Intercept,  slope,  and  measurement  variance), 
along  with  the  sample  mean  of  the  population  covariance  matrix  Z  are 
s\iamarized  in  Table  3  for  the  various  models  for  the  boys,  and  for  the  St2 
model  for  the  girls.  The  boy  population  parameter  inferences  shift  in  the 
expected  directions  with  the  various  normal  models;  e.g. ,  the  Intercept  in  NO 
is  higher  than  that  of  Nl  and  N2.  The  population  variance  of  both  the 
intercept  and  slope  are  higher  in  the  NO  model  than  in  the  other  normal 
models.  The  results  for  the  St^  model  lie,  as  expected,  between  )«}  and  N2. 


Table  3  Here 


The  main  objective  in  this  problem  is  to  make  Inferences  concerning  the 
difference  in  dental  growth  between  boys  and  girls.  For  this  comparison,  the 
St^  model  was  chosen  for  both  groups.  Let  6(t)  be  the  difference  in  the 
dental  measurement  of  boys  and  girls  at  age  t,  given  by 

«(t)  -  (a^  +  9^(t-ll))  -  (ttg  ♦  . 

One  can  easily  obtain  posterior  samples  of  d(t)  by  direct  substitution  of  the 
corresponding  values  of  the  generated  samples  of  a^,  and  o^,  Figure  3 

displays  box  plots  of  the  differences  at  ages  8,  10,  12  and  14  years.  It  is 
quite  clear  that  the  boys  have  higher  dental  measurements  and  the  differences 
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become  Increasingly  larger  as  a  function  of  age. 


Figure  3  Here 


We  have  denonstated  here  that  the  Gibbs  sampler  provides  a  straight¬ 
forward  means  both  for  inference  summaries  and  for  diagnostic  checks.  Once 
the  basic  posterior  samples  are  obtained  for  the  original  model  parameters, 
the  effort  required  to  perform  additional  analyses  is  small. 


5.  ANALYSIS  OF  THE  NONLINEAR  PHARNACOKINEnC  EXANPLE 

We  begin  in  Section  5. 1  by  identifying  the  forms  of  the  full  conditionals 
for  the  power  model  specification  (2.2)  given  in  Section  2.2.  In  Section  5.2 
we  carry  out  an  suialysls  of  the  data  presented  in 'Table  2  and  discuss  the  two 
alternative  model  specifications  of  Sections  2.2. 

5.1  A  nonlinear  population  model 

Svtppose  that  the  first-stage  of  the  model  has  the  form 

/  "i  I  "l 

^  ‘’'ij'-i-  ■'i-  ’'’"OP  "'J'ylVSi’-  • 

vdiere  V^j(6p  is  a  nonlinear  function.  Suppose  further  that  the  second  and 
third-stage  structures  are  as  given  in  Section  4.1,  with,  additionally,  a 
\inifom  prior  for  r  over  some  suitably  assessed  interval  0  s  y  s  c,  with  c 
known.  As  in  that  section,  the  Gibbs  sampler  can  again  be  defined  by  the  full 
conditional  distributions 

?•  I* 

§'  if*  I* 

^■'iljf*  S*  if*  '^'^*  ^*  ^  1 . ^ 

®*  if*  5*  • 
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The  conditional  distributions  for  fi  and  Z  ^  are  identical  to  those  described 
in  Section  4.1.  The  conditional  distribution  for  is  similar  with 


.  r,  Tj.  j*  1} 


Ga 


Ti|JJ(Po  ♦  n^) 


X 


i/Si'’' 


For  each  i  ■  1,...,7,  the  conditional  distribution  of  the  p-vector  0^  is  given 
by 


a-  I*  Sj'  * 


"i  -  ..X  p  n. 

0  Ivi?)  "’[■’i 


T.  li 


V?i’ 


X  exp  -  p)^  -  p)J 


Ignoring  the  final  tern  gives  the  conditional  form  for  y  on  the  range  0  s  y  s 
c.  To  generate  from  the  conditional  distributions  of  0^  and  y  we  use  the 
generalized  ratlo-of~uniforms  technique  as  described  in  Uakefield,  Celfand 
and  Smith  (1991)  and  summarized  here  in  an  Appendix.  For  a  range  of  examples, 
this  method  has  been  found  to  be  considerably  more  reliable  than  the  normal 
approximation  rejection  techniques  used  in  a  related  setting  by  Zeger  and 
Karim  (1991). 


5.2  Illustrative  analysis 

Using  the  power  model  stmicture  outlined  in  Section  5. 1  the  data  in  Table 
2  was  analyzed  using  the  following  hyperameters 


F-  -  0.  C“^  -  0,  p  «  2  ,  R 


R  vas  chosen  in  the  following  manner.  Vfe  require  an  approximate  estimate  of 
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the  variance-covariance  aatrix  of  the  log  a  and  log  ^  population  distribution. 
The  off-di8igonal  element  is  chosen  to  be  zero.  Without  loss  of  generality 
consider  log  a.  In  many  pharmacokinetic  applications  we  have  some  idea  of  the 
magnitude  of  the  coefficient  of  variation  of  the  a*s.  Now  if  the  variance  of 
the  a’s  is  small  we  have 


log  a  «  log  £(a)  +  • 


and 


var  (log  a) 


var(tt) 

£(a)^ 


Consequently  the  square  of  the  coefficient  of  variation  gives  an  estimate  of 
the  variance  of  log  a.  In  this  example  the  coefficient  of  variation  for  both 
a  and  ^  was  estimated  to  be  30%. 


The  upper  bound  for  r.  c  was  chosen  to  be  5  in  this  example.  The  Gibbs 
strategy  was  as  follows.  Good  initial  estimates  were  found  and  from  these  10 
cycles  were  run  for  400  iterations.  Samples  were  extracted  from  each  cycle, 

10  iterations  apart  from  iteration  310  onwards  to  give  100  realizations  from 
the  posterior.  The  marginal  distribution  for  r  was  found  to  be  located  at 
approximately  0.6.  As  we  noted  in  Section  2.2  one  of  the  principal  aims  of 
stxidies  of  this  kind  is  the  computation  of  predictive  distributions  for 
concentrations.  The  model  described  in  Section  5. 1  was  found  to  be  Inadequate 
in  providing  such  predictive  distributions.  The  value  of  y  was  not  large 
enough  to  ensure  that  the  predictions  avoided  assigning  significant 
probabilities  to  negative  concentrations.  The  lognormal  intra- individual 
error  specification  described  in  Section  2.2  would,  of  course,  always  produce 
positive  predictive  concentrations.  However,  the  lognormal  model  also  does 

not  fit  these  data,  as  is  clear  from  the  following.  The  lognormal  model  y  * 

c  2  2 

D(6)e  with  c-N(0,  <r  )  can,  for  small  o*  ,  be  approximated  by 

y  »  i|(e)  (1  ♦  c)  , 

2  2 

yielding  E(Y)  »  i}(9)  and  Var(y)  »  tj  e  .  This  corresponds  to  y  >  2  in  our 
power  model  error  specification  .  Consequently,  if  the  lognormal  error  model 
were  correct  and  the  above  approximation  were  accurate  we  would  expect  the 
marginal  distribution  for  y  to  be  located  close  to  the  value  2.  For  error 
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variances  and  paraneter  values  similar  to  those  for  this  dataset,  lognormal 
data  %iere  simulated  and  the  marginal  distribution  for  7  was  indeed  found  to  be 
close  to  2  and  not  0.6  as  in  this  example.  We  conclude  that  the  lognormal 
specification  is  not  adequate  for  this  example  and  so  do  not  include  here 
details  of  the  Gibbs  sampler  conditional  forms.  Ue  note,  however,  that 
the  conditional  form  for  is  the  only  element  of  the  sampler  to  change  and 
generation  via  the  ratio-of-unifoms  method  is  again  possible. 

The  truncated  normal  model  outlined  in  Section  2.2  produces  the  following 
first  stage 

I  "i 

P  P  ^1-  j'l 


for  y^j  a  0.  Here  ^(. )  denotes  the  ciimulative  distribution  function  of  the 
standard  normal  distribution.  In  terms  of  the  Gibbs  sampler  for  the  power 
model,  specified  in  the  previous  Section,  the  conditional  distributions 
for  6,  T  and  7  are  affected  by  this  change.  For  0^  and  7  the  ratio-of- 
uniforms  can  still  be  used  though  there  is  an  additional  computational  expense 
in  the  numerical  calculation  of  •(.).  Previoiisly  the  generation  for  the  t^’s 
was  straightforward.  This  is  no  longer  the  case  so,  again,  the  ratio-of- 
uniforms  technique  (for  log  t^)  was  utilized  and  the  analysis  successfully 
implemented.  Figure  4  shows  a  predictive  distribution  for  concentration  for 
patient  2  at  32  hours. 


Figure  4  Here 


For  population  inferences,  we  note  the  following  interesting  lss\ie.  Our 
second  stage  assumption  is  that  the  (a^,  ^J)•pai^8  are  lognomally  distributed 
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To  sunnarize  the 

population  distribution  of,  say,  the  a^'s  vie  therefore  have  a  number  of 
options.  We  could  choose  to  calculate  the  mean,  the  mode,  the  median,  or  more 
generally  the  quantiles  of  order  q  of  the  distribution,  given,  respectively, 
by  exp(|ij  ♦  explp^  -  Zjj),  exp(p^),  and  explfx^  +  vdiere  is 

the  qviantile  of  order  q  of  an  tf(0,  1)  distribution.  For  a  posterior  sample 
from  the  marginal  distribution  of  and  we  can  easily  generate  any  of 
these  quantities.  Recall  that  vie  may  also  be  interested  in  the  clearance  Q 
and  the  half-live  X,  idiere  for  the  ith  Individual  Oj  «  and  »  log2/^J. 

It  is  straightforward  to  make  inferences  about  these  quantities  since  the 
(Qj,  A^)-pairs  also  have  a  lognormal  distribution  with  mean  logdog 

2)  -  Pg)  covariance  elements  (Zjj  *  ^  *  21^2'  ~  ^12  ~ 

5  shows  a  bivariate  plot  of  the  medians  for  Q  and  A,  with  corresponding 
(histogram)  density  estimates. 


with  mean  (iz^,  and  covariance  elements  ^^2* 


Figure  5  Here 


6.  DISCUSSION  AND  RELATION  TO  OTHER  WORK 

Two  alternative  approaches  for  approximating  posterior  distributions  for 
hierarchical  models,  such  as  those  of  Section  2,  are  the  EM-type  approxi¬ 
mations  (Raclne-Poon,  1985;  Raclne-Poon  and  Smith,  1990),  and  the  Laplace 
approximation  method  (see,  for  example,  Tierney  and  Kadane,  198S;  Kass  and 
Stef fey  1989). 

The  EM  type  approximation  treats  the  Individual  level  effects,  e^,  as 
missing  data  and  uses  the  EM  algorithm  to  obtain  the  node  of  the  Joint 
posterior  distribution  of  the  hyperparaneters  of  9^  (in  our  terminology,  the 
population  parameter  f).  The  marginal  posterior  density  for  a  population 
level  parameter  is  approximated  by  the  full  conditional  distribution  for  this 
parameter,  with  estimates  from  the  EM  algorithm  replacing  the  conditioning 
parameters.  Since  the  exact  marginal  posterior  distribution  is  a  continuous 
mixture  of  these  full  conditional  forms,  we  night  expect  such  an  aq>proxination 
to  be  poor  (see  Gelfand  et  eJ,  1990,  for  evidence  that  this  is  the  case). 
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More  specifically,  let  us  consider  the  nodels  in  Sections  2.1  and  2.2 
omitting  the  variance  power  transformation  for  simplicity.  For  the  student- 
linear  population  atodel  interest  would  often  focus  upon  the  posterior 
distribution  of  p.  One  attempt  at  using  the  EM  approximation  is  to  take  as 
the  estimates  of  the  posterior  distribution  of  p.  the  full  conditional  for  p, 
substituting  estimates  of  e,  Z  and  X  obtained  by  the  EM  algorithm.  An  E  step 
is  straightforward.  The  M  step,  however,  requires  a  maximization  over  p,  Z 
and  X  given  0,  and  with  more  parameters  than  data  this  maximum  is  unbounded. 
If  instead  we  ‘integrate  out  the  X^* ,  again  the  E  step  is  straightforward  but 
now.  without  conjugacy.  the  M  step  cannot  be  achieved  in  closed  form. 

A 

In  the  normal  nonlinear  model,  taking  0^  as  say,  the  MLE  or  a  nonlinear 
least  sqiiares  estimator  of  0^.  the  EM  approximation  replaces 


j-i 

A 

with  an  approximate  normal  distribution  for  based  upon  asymptotic  theory. 

The  remainder  of  the  model  specification  is  unchanged  so  that,  with  the 
resultant  conjugacy,  implementation  of  the  EM  algorithm  is  straightforward. 
However,  since  interesting  applications  tend  to  have  small  to  moderate  n^  the 
quality  of  the  normal  approximation  is  questionable. 

With  regard  to  the  Laplace  method  approach,  we  first  note  that  the  Joint 
posterior  distribution  of  all  the  parameters  is  proportional  to  likelihood  x 
prior.  Hence  any  expectation,  including  any  marginal  distributions,  can  be 
expressed  as  a  ratio  of  Integrals.  The  Laplace  method  i^proximates  both  the 
numerator  and  denominator  Integrals.  In  particular,  the  version  by  Tierney 
and  Kadane  (1986)  appears  to  offer  the  best  such  general  approximation  and  has 
been  tailored  in  Kass  and  Stef  fey  (1989)  for  application  to  exchangeable 
hierarchical  nodels. 

When  the  model  is  conjugate,  by  tdiich  we  mean  that  closed  form 
integration  over  0^  results,  the  Laplace  iq>proxination  nay  be  applied  without 
difficulty.  If  not,  as  with  the  Student  linear  and  normal  nonlinear  nodels, 
further  techniq\ies  are  needed.  Kass  and  Steffey  suggest  that,  if  the 
dimensionality  of  0  is  high,  perhaps  an  approximate  EM-type  method  might  be 
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used!  Ue  note  additionally  that,  in  implementing  the  Laplace  approximation, 
two  function  maximizations  will  always  be  required  and  at  least  one  additional 
maximization  will  be  required  for  each  different  expectation  that  is  sought. 
Also,  as  demonstrated  in  Achcar  and  Smith  (1990),  the  approximation  is  very 
sensitive  to  parameterization. 

The  Gibbs  sampling  approach  is  able  to  avoid  many  of  the  problems 
associated  with  the  above  alternative  approxlMitions,  and  appears  to  offer  the 
most  flexible  and  powerful  method  currently  available  for  the  routine  analysis 
of  challenging  population  model  problems. 
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Table  1. 


Measxirements  on  11  Girls  and  16  Bovs,  at  4  Different  Ages 


Girls  -  Age  In  Years 


Individual 

8 

10 

12 

14 

1 

21 

20 

21.5 

23 

2 

21 

21.5 

24 

25.5 

3 

20.5 

24 

24.5 

26 

4 

23.5 

24.5 

25 

26.5 

5 

21.5 

23 

22.5 

23.5 

6 

20 

21 

21 

22.5 

7 

21.5 

22.5 

23 

25 

8 

23 

23 

23.5 

24 

9 

20 

21 

22 

21.5 

10 

16.5 

19 

19 

19.5 

11 

24.5 

25 

28 

28 

Boys  >  Age  in  Years 

Individual 

8 

10 

12 

14 

12 

26 

25 

29 

31 

13 

21.5 

22.5 

23 

26.5 

14 

23 

22.5 

24 

27.5 

15 

25.5 

27.5 

26.5 

27 

16 

20 

23.5 

22.5 

26 

17 

24.5 

25.5 

27 

28.5 

18 

22 

22 

24.5 

26.5 

19 

24 

21.5 

24.5 

25.5 

20 

23 

20.5 

31 

26 

21 

27.5 

28 

31 

31.5 

22 

23 

23 

23.5 

25 

23 

21.5 

23.5 

24 

28 

24 

17 

24.5 

26 

29.5 

25 

22.5 

25.5 

25.5 

26 

26 

23 

24.5 

26 

30 

27 

22 

21.5 

23.5 

25 

23 


Table  2 


Pharmacokinetic  Data 


HOURS  AFTER  ADMINISTRATI(»I 


PATIENT 

2 

4 

6 

8 

10 

24 

28 

32 

1 

1.09 

0.75 

0.53 

0.34 

0.23 

0.02 

2 

2.03 

1.28 

1.20 

1.02 

0.83 

0.28 

3 

1.44 

1.30 

0.95 

0.68 

0.52 

0.06 

4 

1.55 

0.96 

0.80 

0.62 

0.46 

0.08 

5 

1.35 

0.78 

0.50 

0.33 

0.18 

0.02 

6 

1.08 

0.59 

0.37 

0.23 

0.17 

0.00 

7 

1.32 

0.74 

0.46 

0.28 

0.27 

0.03 

0.02 

0.00 

8 

1.63 

1.01 

0.73 

0.55 

0.41 

0.01 

0.06 

0.02 

9 

1.26 

0.73 

0.40 

0.30 

0.21 

0.00 

10 

1.30 

0.70 

0.40 

0.25 

0.14 

0.00 

24 


Table  3. 


Model 

■ean  of  £ 

BOYS 

24.65 

(23.96,  25.94) 

0.7781 

(0.5986,  0.9867) 

2.65 

(1.79,  3.87) 

1.7967 

-0.0229 

0.0742 

NO 

24.99 

(24.30.  26.92) 

0.7978 

(0.5952,  0.9931) 

2.90 

(1.86,  4.25) 

2.5470 

0. 0063 

0.0927 

N1 

24.67 

(23.84,  25.25) 

0.7999 

(0.6161,  0.9799) 

2.86 

(1.92,  3.79) 

1.5290 

-0.0021 

0.0919 

N2 

24.66 

(23.84,  25.28) 

0.7000 

(0.4771,  0.9098) 

2.55 

(1.68,  3.56) 

1.7041 

0.0234 

0.0719 

GIRLS 

^*2 

22.47 

(21.39,  23.38) 

0.4322 

(0.2175,  0.6085) 

0.49 

(0.28,  0.83) 

3. 1409 

0. 1070 

0.0594 

( )  denotes  a  90%  sample  interval 
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APPENDIX 


The  generalized  ratlo-of -uniforms  method 


Let  f(e)  denote  the  unnomalized  vinivariate  density  from  vdiich  \te  wish  to 
generate.  The  generalized  ratlo-of-unlfoms  method  Is  as  follows.  If  we 
generate  bivariate  points  In  the  region  C  defined  by 


C  « 


(u.  v) 


0  <  s*  f 


(A.  1) 


with  r  >  0,  then  the  resulting  ‘ rat lo-of -uniforms’  —  has  distribution  f/J*f. 

u 

The  efficiency  of  the  method  depends  crucially  upon  the  ease  with  idiich  we  can 
generate  points  within  the  region  C.  The  strategy  idilch  has  proved  most 
successful  Is  to  contain  C  within  a  rectangle  R  «  [0,  a]  x  [b~,  b^].  We 
define  a,  b  and  b*  shortly.  It  is  shown  In  Wakefield,  Gelfahd  and  Smith 
(1991)  that  the  probability  of  aicceptance  of  a  point  generated  in  N  is,  in 
general,  large  If  we  generate  instead  from  ^  «  0  -  0*,  tdiere  0*  is  the  mode  of 
/(0). 


The  aforementioned  paper  also  recommends  the  use  of  r  «  .5.  With  this 
value  and  the  ’mode-shift’  we  obtain  the  following  strategy. 

f 

1  Determine  a  *(max  f(0)'''V 

0 

2  Determine  b“  «  min  ^[/(^  +  0*)]*'''^*'  , 

^  s  0 

and  b*  «  max  . 

^  a  0 

3  Generate  u  -  1/(0,  a)  and  v  ~  U(b  ,  b  ).  Let  0  b  —  <*■  e  . 

u*" 

A 

4  If  0  is  not  contained  in  the  support  of  0  go  to  3. 

5  If  u  s  f(e)  then  accept  0,  otherwise  go  to  3. 

With  this  strategy,  typical  acceptance  probabilities  of  around  0.8  have 
resulted  for  a  range  of  models.  As  long  as  the  aaxima/nlnlma  defined  in  1  and 
2  exist  the  method  can  be  applied.  Apart  from  this  restriction  the  method  is 
completely  general  and  does  not,  for  example,  need  log-concavity  of  f,  as  per 
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the  adaptive  rejection  sampling  method  described  in  Gilks  and  Wild  (1992). 
The  price  of  this  generality  is,  of  coxirse,  the  need  to  carry  out  the 
maximisations/minimizations  in  order  to  find  the  bounding  rectangle. 
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Figure  1 :  least  squares  estimates  of  intercepts  and  slopes,  plotting 

symbol  Is  individual  number 


INTERCEPT  AT  11  YEARS 


Figure  2:  boxplots  of  lambda’s  for  the  boys.  Boxplots  are  drawn 
containing  5%,  10%.  50%,  90%  and  95%  of  samples 
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Figure  3:  Difference  between  growth  of  boys  and  girls  versus  age 
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